{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "colab": {
      "name": "NativeCudaLoadLinux",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/DiffSharp/DiffSharp/blob/dev/notebooks/debug/NativeCudaLoadLinux.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rhwrF74AUvvB"
      },
      "source": [
        "**Important note:** You should always work on a duplicate of the course notebook. On the page you used to open this, tick the box next to the name of the notebook and click duplicate to easily create a new version of this notebook.\n",
        "\n",
        "You will get errors each time you try to update your course repository if you don't do this, and your changes will end up being erased by the original course version."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vcUKR0zOKLV5"
      },
      "source": [
        "# Debugging CUDA package problems on Linux using CUDA-enabled machines on Google CoLab\n",
        "\n",
        "Google Colab offers free CUDA-enabled GPU machines which are very useful for debugging problems with CUDA packages on Linux.\n",
        "\n",
        "THis notebook started with the investigations in https://github.com/xamarin/TorchSharp/issues/169 and is being kept for future times we need to investigate similar failures\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jrVHbROvKiQJ"
      },
      "source": [
        "### Investigate GLIB and GLIBCXX dependencies available on this system\n",
        "\n",
        "One reason Linux native binaries fail to load is when they have been built on a later Linux system, e.g. built on Ubuntu 20.04 and you're trying to load on Ubuntu 18.04.  This can even happen 18.04 (Build VM) to 18.04 (CoLab).  One particular pair of dependencies is \"GLIB\" (libc) and \"GLIBCXX\" (libstdc++).  Failures for these cause messages like \"GLIBCXX_3.4.14 missing\" - this is a version symbol in the native binaries.\n",
        "\n",
        "Do it's important to determine the maximum GLIB and GLIBCXX symbols available on both the build machine and the target machine.  Here's an example:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SEbgqe3dKoMo"
      },
      "source": [
        "# Investigate GLIB and GLIBCXX dependencies available on this system\n",
        "!ldd --version\n",
        "!/sbin/ldconfig -p | grep stdc++\n",
        "!strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep LIBCXX\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jPezNW6wwqlP"
      },
      "source": [
        "## Install .NET SDK\n",
        "\n",
        "To run F# code on CoLab you need to install the .NET SDK on the CoLab VM:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "uPwCZKr-dAqp"
      },
      "source": [
        "# Install dotnet\n",
        "!wget https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb && sudo dpkg -i packages-microsoft-prod.deb && sudo apt-get update && sudo apt-get install -y apt-transport-https && sudo apt-get update && sudo apt-get install -y dotnet-sdk-5.0\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "N0iT-SYGdHgP"
      },
      "source": [
        "!dotnet --version"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JQ-byfcCwmf6"
      },
      "source": [
        "## Restore packages (libtorch-cpu)\n",
        "\n",
        "Native loading for TorchSHarp.dll ultimately wants to bind to P/Invoke library \"LibTorchSharp\" which in turn has a bunch of dependencies on `torch.dll` or `libtorch.so` plus a whole lot of other things.\n",
        "\n",
        "\n",
        "First we want to acquire the packages we want to load:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CxdQUMUDXL6U",
        "outputId": "84241b59-79a4-44c2-a4a6-ad1f92320be4"
      },
      "source": [
        "# Restore packages (libtorch-cpu)\n",
        "!echo \"printfn \\\"phase0\\\"\" > foo.fsx\n",
        "!echo \"#r \\\"nuget: TorchSharp, 0.92.52515\\\"\" >> foo.fsx\n",
        "!echo \"#r \\\"nuget: libtorch-cpu, 1.9.0.10\\\"\" >> foo.fsx\n",
        "!echo \"printfn \\\"done\\\"\" >> foo.fsx\n",
        "!cat foo.fsx\n",
        "!echo \"-------\"\n",
        "!dotnet fsi foo.fsx\n"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "printfn \"phase0\"\n",
            "#r \"nuget: TorchSharp, 0.92.52515\"\n",
            "#r \"nuget: libtorch-cpu, 1.9.0.10\"\n",
            "printfn \"done\"\n",
            "-------\n",
            "\u001b[?1h\u001b=\u001b[?1h\u001b=\u001b[6n\u001b[6n\u001b[1;1Hphase0\n",
            "\u001b[?1h\u001b=done\n",
            "\u001b[?1h\u001b="
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3ijZxZw1Lr9h"
      },
      "source": [
        "Next we look around the dependencies of libLibTorchSharp."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "An4M6BSCXm0k"
      },
      "source": [
        "# Look around packages and dependencies  (libtorch-cpu)\n",
        "! /lib/x86_64-linux-gnu/libc.so.6 --version\n",
        "!ls /root/.nuget/packages/libtorch-cpu/1.9.0.10/runtimes/linux-x64/native\n",
        "!ls /root/.nuget/packages/torchsharp/0.92.52515/runtimes/linux-x64/native/\n",
        "!echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH\n",
        "!ldd --version\n",
        "!ldd  /root/.nuget/packages/torchsharp/0.92.52515/runtimes/linux-x64/native/libLibTorchSharp.so\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vn87qtuiL0i5"
      },
      "source": [
        "Next we run some F# code by creating a script and calling `TorchSharp.Torch.IsCudaAvailable`"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fC237CqodbPX"
      },
      "source": [
        "# Run some code with workaround  (libtorch-cpu)\n",
        "!echo \"printfn \\\"phase0\\\"\" > foo.fsx\n",
        "!echo \"#r \\\"nuget: TorchSharp, 0.92.52515\\\"\" >> foo.fsx\n",
        "!echo \"#r \\\"nuget: libtorch-cpu, 1.9.0.10\\\"\" >> foo.fsx\n",
        "#!echo \"printfn \\\"phase1\\\"\" >> foo.fsx\n",
        "#!echo \"open System.Runtime.InteropServices\" >> foo.fsx\n",
        "#!echo \"NativeLibrary.Load(\\\"/root/.nuget/packages/libtorch-cpu/1.9.0.10/runtimes/linux-x64/native/libtorch.so\\\") |> printfn \\\"%A\\\"\" >> foo.fsx\n",
        "!echo \"printfn \\\"phase2\\\"\" >> foo.fsx\n",
        "!echo \"TorchSharp.Torch.IsCudaAvailable() |> printfn \\\"%A\\\"\" >> foo.fsx\n",
        "!cat foo.fsx\n",
        "!dotnet fsi foo.fsx"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-4_iuEx_L8jB"
      },
      "source": [
        "This should report \"false\" since we're using the LibTorch CPU binaries and they can't connect to the GPU resource."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FgOKh9pWwxUo"
      },
      "source": [
        "## Restore packages (libtorch-cuda)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "r61BnWwIMJjg"
      },
      "source": [
        "Next we do a similar process for libtorch-cuda-11.7-linux-x64.\n",
        "\n",
        "> This will take a long time as the packages are huge\n",
        "\n",
        "> It will be faster on the next iteration"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Hy2WYgMS5iWy",
        "outputId": "c21e3775-ffdf-4478-c325-048f0d52a56b"
      },
      "source": [
        "# Restore packages (libtorch-cuda)\n",
        "!echo \"printfn \\\"phase0\\\"\" > foo.fsx\n",
        "!echo \"#i \\\"nuget: https://donsyme.pkgs.visualstudio.com/TorchSharp/_packaging/packages2%40Local/nuget/v3/index.json\\\"\" >> foo.fsx\n",
        "!echo \"#r \\\"nuget: TorchSharp, 0.92.52515\\\";;\" >> foo.fsx\n",
        "!echo \"#r \\\"nuget: libtorch-cuda-11.7-linux-x64, 1.9.0.10\\\";;\" >> foo.fsx\n",
        "!echo \"printfn \\\"done\\\"\" >> foo.fsx\n",
        "!cat foo.fsx\n",
        "!echo \"-------\"\n",
        "!dotnet fsi foo.fsx\n"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "printfn \"phase0\"\n",
            "#i \"nuget: https://donsyme.pkgs.visualstudio.com/TorchSharp/_packaging/packages2%40Local/nuget/v3/index.json\"\n",
            "#r \"nuget: TorchSharp, 0.92.52515\";;\n",
            "#r \"nuget: libtorch-cuda-11.7-linux-x64, 1.9.0.10\";;\n",
            "printfn \"done\"\n",
            "-------\n",
            "\u001b[?1h\u001b=\u001b[?1h\u001b=\u001b[6n\u001b[6n\u001b[1;1Hphase0\n",
            "\u001b[?1h\u001b=\u001b[?1h\u001b=done\n",
            "\u001b[?1h\u001b="
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Eo5npXjbMQWJ"
      },
      "source": [
        "Next we look around the dependencies of libLibTorchSharp."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "XXNaxxL45qyZ",
        "outputId": "eacf57e6-f073-43ae-ede7-24f23648591e"
      },
      "source": [
        "# Look around packages and dependencies  (libtorch-cuda)\n",
        "! /lib/x86_64-linux-gnu/libc.so.6 --version\n",
        "!ls /root/.nuget/packages/libtorch-cuda-11.7-linux-x64/1.9.0.10/\n",
        "!echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH\n",
        "!ldd --version\n",
        "!ls /root/.nuget/packages/torchsharp/0.92.52515/runtimes/linux-x64/native/\n",
        "#!ldd  /root/.nuget/packages/torchsharp/0.92.52515/runtimes/linux-x64/native/libLibTorchSharp.so\n",
        "!ls /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/\n",
        "!ldd /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libLibTorchSharp.so"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "buildTransitive\n",
            "lib\n",
            "libtorch-cuda-11.7-linux-x64.1.9.0.10.nupkg\n",
            "libtorch-cuda-11.7-linux-x64.1.9.0.10.nupkg.sha512\n",
            "libtorch-cuda-11.7-linux-x64.nuspec\n",
            "LICENSE-LIBTORCH.txt\n",
            "LD_LIBRARY_PATH=/usr/lib64-nvidia\n",
            "ldd (Ubuntu GLIBC 2.27-3ubuntu1.2) 2.27\n",
            "Copyright (C) 2018 Free Software Foundation, Inc.\n",
            "This is free software; see the source for copying conditions.  There is NO\n",
            "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n",
            "Written by Roland McGrath and Ulrich Drepper.\n",
            "libLibTorchSharp.so\n",
            "libc10_cuda.so\t\t\t  libnvToolsExt-24de1d56.so.1\n",
            "libc10d_cuda_test.so\t\t  libprocess_group_agent.so\n",
            "libc10.so\t\t\t  libpytorch_jni.so\n",
            "libcaffe2_detectron_ops_gpu.so\t  libshm.so\n",
            "libcaffe2_module_test_dynamic.so  libtensorpipe_agent.so\n",
            "libcaffe2_nvrtc.so\t\t  libtorchbind_test.so\n",
            "libcaffe2_observers.so\t\t  libtorch_cpu.so\n",
            "libcudart-6d56b25a.so.11.0\t  libtorch_cuda_cpp.so\n",
            "libfbjni.so\t\t\t  libtorch_cuda_cu.so\n",
            "libgomp-7c85b1e2.so.1\t\t  libtorch_cuda.so\n",
            "libjitbackend_test.so\t\t  libtorch_global_deps.so\n",
            "libLibTorchSharp.so\t\t  libtorch_python.so\n",
            "libnvrtc-3a20f2b6.so.11.3\t  libtorch.so\n",
            "libnvrtc-builtins.so\n",
            "\tlinux-vdso.so.1 (0x00007ffc941eb000)\n",
            "\t/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 (0x00007fc2df705000)\n",
            "\tlibtorch.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libtorch.so (0x00007fc2df503000)\n",
            "\tlibc10.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libc10.so (0x00007fc2df26c000)\n",
            "\tlibtorch_cpu.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libtorch_cpu.so (0x00007fc2ccdfc000)\n",
            "\tlibpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc2ccbdd000)\n",
            "\tlibstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc2cc854000)\n",
            "\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc2cc4b6000)\n",
            "\tlibgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc2cc29e000)\n",
            "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc2cbead000)\n",
            "\t/lib64/ld-linux-x86-64.so.2 (0x00007fc2dfd6a000)\n",
            "\tlibunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007fc2cbc92000)\n",
            "\tlibtorch_cuda.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libtorch_cuda.so (0x00007fc2bde7b000)\n",
            "\tlibtorch_cuda_cu.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libtorch_cuda_cu.so (0x00007fc27e2a0000)\n",
            "\tlibtorch_cuda_cpp.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libtorch_cuda_cpp.so (0x00007fc20b85f000)\n",
            "\tlibgomp-7c85b1e2.so.1 => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libgomp-7c85b1e2.so.1 (0x00007fc20b635000)\n",
            "\tlibrt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc20b42d000)\n",
            "\tlibdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc20b229000)\n",
            "\tlibcudart-6d56b25a.so.11.0 => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libcudart-6d56b25a.so.11.0 (0x00007fc20afa0000)\n",
            "\tliblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fc20ad7a000)\n",
            "\tlibc10_cuda.so => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libc10_cuda.so (0x00007fc20ab4a000)\n",
            "\tlibnvToolsExt-24de1d56.so.1 => /root/.nuget/packages/torchsharp/0.92.52515/lib/net6.0/cuda-11.7/libnvToolsExt-24de1d56.so.1 (0x00007fc20a940000)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fRar4V-ZMfyf"
      },
      "source": [
        "# Run some code with workaround  (libtorch-cpu)\n",
        "!echo \"printfn \\\"phase0\\\"\" > foo.fsx\n",
        "!echo \"#r \\\"nuget: TorchSharp, 0.92.52515\\\"\" >> foo.fsx\n",
        "!echo \"#r \\\"nuget: libtorch-cuda-11.7-linux-x64, 1.9.0.10\\\"\" >> foo.fsx\n",
        "#!echo \"printfn \\\"phase1\\\"\" >> foo.fsx\n",
        "#!echo \"open System.Runtime.InteropServices\" >> foo.fsx\n",
        "#!echo \"NativeLibrary.Load(\\\"/root/.nuget/packages/libtorch-cpu/1.9.0.7/runtimes/linux-x64/native/libtorch.so\\\") |> printfn \\\"%A\\\"\" >> foo.fsx\n",
        "!echo \"printfn \\\"phase2\\\"\" >> foo.fsx\n",
        "!echo \"TorchSharp.Torch.IsCudaAvailable() |> printfn \\\"%A\\\"\" >> foo.fsx\n",
        "!cat foo.fsx\n",
        "!dotnet fsi foo.fsx"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XD0jCHOUMWCI"
      },
      "source": [
        "This should report \"true\" since we're using the LibTorch CPU binaries and they can't connect to the GPU resource."
      ]
    }
  ]
}